Towards a Unified Graph Model for Supporting Data Management and Usable Machine Learning

Authors

  • Guoliang Li
  • Meihui Zhang
  • Beng Chin Ooi
Abstract

Data management and machine learning are two important tasks in data science. However, they have been independently studied so far. We argue that they should be complementary to each other. On the one hand, machine learning requires data management techniques to extract, integrate, and clean the data, and to support scalable and usable machine learning, making it user-friendly and easily deployable. On the other hand, data management relies on machine learning techniques to curate data and improve its quality. This requires database systems to treat machine learning algorithms as their basic operators, or at the very least, optimizable stored procedures. It poses new challenges as machine learning tasks tend to be iterative and recursive in nature, and some models have to be tweaked and retrained. This calls for a reexamination of database design to make it machine learning friendly. In this position paper, we present a preliminary design of a graph model for supporting both data management and usable machine learning. To make machine learning usable, we provide a declarative query language that extends SQL to support data management and machine learning operators, and provide visualization tools. To optimize data management procedures, we devise graph optimization techniques to support a finer-grained optimization than the traditional tree-based optimization model. We also present a workflow to support machine learning (ML) as a service to facilitate model reuse and implementation, making it more usable, and discuss emerging research challenges in unifying data management and machine learning.
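
As a rough illustration of why a graph-shaped plan can be optimized at a finer grain than a tree-shaped one, the sketch below runs a tiny operator graph in which one cleaning step feeds both a stand-in "training" operator and an ordinary aggregate. It is not the authors' system or query language: the operator names, the `plan` dictionary layout, and the `execute` helper are all hypothetical, and the "model" is just a mean. The point is only that a shared node in a graph is evaluated once, whereas a tree plan would duplicate it under each parent.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Counter used only to show how often the shared operator runs.
calls = {"clean": 0}

def scan():
    # Hypothetical base table with one missing value.
    return [1.0, None, 3.0, 5.0]

def clean(rows):
    # Data management operator: drop missing values.
    calls["clean"] += 1
    return [r for r in rows if r is not None]

def train_mean(rows):
    # Stand-in for an ML training operator; the "model" is the mean.
    return sum(rows) / len(rows)

def count(rows):
    # Ordinary relational-style aggregate.
    return len(rows)

# Hypothetical plan format: node name -> (operator, input node names).
# "clean" has two consumers, so the plan is a DAG rather than a tree.
plan = {
    "scan": (scan, []),
    "clean": (clean, ["scan"]),
    "train": (train_mean, ["clean"]),
    "count": (count, ["clean"]),
}

def execute(plan):
    # Order nodes so that every operator runs after its inputs.
    deps = {name: inputs for name, (_, inputs) in plan.items()}
    results = {}
    for name in TopologicalSorter(deps).static_order():
        op, inputs = plan[name]
        results[name] = op(*(results[i] for i in inputs))
    return results

results = execute(plan)
```

Here `results["train"]` and `results["count"]` are both computed from a single evaluation of `clean`, which is the kind of sharing a tree-based plan cannot express directly.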


Similar resources

Machine Learning and Citizen Science: Opportunities and Challenges of Human-Computer Interaction

Background and Aim: In processing large data, scientists have to perform the tedious task of analyzing hefty bulk of data. Machine learning techniques are a potential solution to this problem. In citizen science, human and artificial intelligence may be unified to facilitate this effort. Considering the ambiguities in machine performance and management of user-generated data, this paper aims to...


Infrastructure for Usable Machine Learning: The Stanford DAWN Project

Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application develop...


Efficiency evaluation of wheat farming: a network data envelopment analysis approach

Traditional data envelopment analysis (DEA) models deal with measurement of the relative efficiency of decision making units (DMUs) in which multiple inputs are consumed to produce multiple outputs. One of the drawbacks of these models is neglecting internal processes of each system, which may have intermediate products and/or independent inputs and/or outputs. In this paper some methods which are usab...


Factors influencing the adoption of E-learning in Tabriz University of Medical Sciences

Background: Electronic Learning (E-learning) is the use of electronic technology in education via computer and the internet. Despite its slow adoption by faculty members, e-learning provides several benefits to individuals and organizations. This study was conducted to determine the factors influencing the adoption of e-learning by faculty members in Tabriz University of Medical Sciences. ...


Towards rapidly developing database-supported machine learning applications

The development of a big data analytics application benefits from a conceptual model that jointly represents aspects about data management as well as machine learning. We demonstrate a recently proposed method to translate a Bayesian network into a usable entity relationship model using the real world example of the TopicExplorer system. TopicExplorer is an interactive web application for text ...



Journal:
  • IEEE Data Eng. Bull.

Volume 40  Issue 

Pages  -

Publication date 2017